IN THE UNITED STATES PATENT AND TRADEMARK OFFICE


APPLICANT:   BERGER, Jens
SERIAL NO.:  to be assigned
FILED:       herewith
TITLE:       METHOD FOR INSTRUMENTAL VOICE QUALITY
             EVALUATION

ART UNIT:    not yet known
EXAMINER:    not yet known


Assistant Commissioner for Patents 
Washington, D.C. 20231 


Sir: 

PRELIMINARY AMENDMENT 
Please amend the above-identified application before a first consideration on the merits as
follows:


IN THE DRAWINGS 

Please replace Figs. 2a, 2b and 3 with the amended Figs. 2a, 2b and 3 submitted herewith.

IN THE TITLE

Please amend the title to read --METHOD FOR DETERMINING SPEECH QUALITY
USING OBJECTIVE MEASURES--.




09/530389 
526 Rec'd PCT/PTO 27 APR 2000

IN THE SPECIFICATION 

On page 1, line 1, change "Preliminary Remarks" to --Field of the Invention--.

On page 1, line 3, before "invention" insert --present--.

On page 1, line 6, change "(undisturbed signal)" to --, or undisturbed signal--.

On page 1, before line 8, insert --Related Technology--.

On page 1, line 8, change "Usually" to --Typically--.

On page 1, delete line 23.

On page 2, line 27, before "Fig. 1" insert --See--.

On page 3, delete line 12.

On page 4, line 6, change "published in:" to --See--.

On page 4, delete line 21.

On page 4, before line 24, insert --Summary of the Invention--.

On page 4, line 24, change "The object" to --An object--.

On page 4, delete line 30.

On page 5, line 1, change "in the invention described here" to --according to the present
invention--.

On page 6, line 1, change "part of the" to --aspect of the present--.

On page 6, delete line 26.
On page 6, before line 28, insert --
Brief Description of the Drawings

Fig. 1 shows a flow chart depicting a prior art calculation of a quality value;

Fig. 2a shows a flow chart depicting a calculation of a quality value using a spectral weighting function;

Fig. 2b shows a flow chart depicting a calculation of a quality value using an inverted spectral weighting
function; and

Fig. 3 shows a flow chart depicting a calculation of a Telecommunication Objective Speech Quality
Assessment (TOSQA) using a spectral weighting function.

Detailed Description--.

On page 6, line 28, change "A special exemplary embodiment is shown by an implementation
according" to --An embodiment of the present invention is now described with reference--.

On page 6, line 29, after "which" insert --shows a flowchart depicting a calculation of a
so-called-- and delete "is known as".

On page 7, line 3, change "In specification of" to --Following-- and change "2b, speech" to
--2b, but with more specificity, reference speech signal 2 and the speech signal to be assessed 4 are
segmented (see blocks 6 and 8, respectively). Speech--.

On page 7, line 4, after "detector" insert --(see block 10)--.

On page 7, line 5, delete both occurrences of "the", after "reference speech signal" insert --2--
and after "assessed" insert --4--.

On page 7, line 6, after "filter" insert --(see blocks 14 and 16, respectively)--.

On page 7, line 7, change "handset." to --handset (see blocks 18 and 20, respectively). The
weighting function W_T(f) is applied to the reference speech signal before the bandpass filtering (see
block 12).--.

On page 7, line 9, after "loudness" insert --(see blocks 22 and 24, respectively)--.

On page 7, line 17, after "1982)" insert --which is hereby incorporated by reference herein--.

On page 7, line 20, after "function" insert --(see block 26)-- and after "value" insert
--TOSQA--.

On page 7, line 22, after "segments" insert --(see block 28)--.

On page 8, line 1, change "Patent Claims" to --WHAT IS CLAIMED IS:--.

IN THE CLAIMS 

Please cancel without prejudice claims 1-6 and add new claims 7-14 as follows: 

--7. (new) A method for determining speech quality using an objective measure, the method
comprising:


calculating a speech quality characteristic value by comparing respective spectral short-time 
properties of an assessed speech signal and of a reference speech signal; 

prior to the comparing the respective spectral short-time properties, reducing differences in 
respective mean spectral envelopes of the assessed speech signal and of the reference speech signal by 
weighting spectral short-time properties of the assessed speech signal and the reference speech signal in 
a predetermined number of time segments using a spectral weighting function so as to include 
differences in the respective mean spectral envelopes in the speech quality characteristic value to a 
limited extent, the spectral weighting function being calculated from the respective mean spectral 
envelopes; and 

calculating a respective intensity value for each of a plurality of frequency bands in a signal 
segment respectively for the assessed speech signal and the reference speech signal using variable limits 
for the frequency bands so that a respective difference between each calculated respective intensity of 
the assessed speech signal and the reference speech signal is reduced. 

8. (new) The method as recited in claim 7 wherein the respective difference between each 
calculated respective intensity of the assessed speech signal and the reference speech signal is a 
respective minimum. 

9. (new) The method as recited in claim 7 further comprising, before the reducing the differences 
in the respective mean spectral envelopes and the calculating the respective intensity, calculating the 
respective mean spectral envelopes of the assessed speech signal and the reference speech signal in the 
form of respective mean power density spectra and wherein the calculating of the spectral weighting 
function is performed using respective quotients of the respective mean power density spectra and 
wherein a short-time power density spectrum of the reference speech signal is weighted with the 
spectral weighting function before calculating the speech quality characteristic value. 


10. (new) The method as recited in claim 7 further comprising, before the reducing the differences
in the respective mean spectral envelopes and the calculating the respective intensity, calculating the
respective mean spectral envelopes of the assessed speech signal and the reference speech signal in the 
form of respective mean power density spectra and wherein the calculating of the weighting function is 
performed for partial regions of the calculated respective mean spectral envelopes so that the reducing 
differences in the mean spectral envelopes occurs only in partial spectral regions. 

11. (new) The method as recited in claim 7 wherein the calculating of the respective intensity value
for each of the plurality of frequency bands is performed before the calculating the quality 
characteristics value and is performed by integrating a respective signal intensity, the width of the 
frequency bands being constant on a pitch scale and further comprising calculating a respective specific 
loudness from the respective intensity values in the respective frequency bands, the limits for the 
frequency bands being selected so that differences in the calculated respective specific loudnesses 
between the assessed signal and the reference speech signal are a respective minimum in each 
frequency band in the signal segment. 

12. (new) The method as recited in claim 7 wherein the calculating of the speech quality 
characteristic value is performed based on a similarity of respective spectral representations of the 
assessed speech signal and the reference speech signal in a plurality of time segments, the respective 
similarity representing a respective correlation coefficient between the respective spectral 
representations of the assessed speech signal and the reference speech signal in a respective time 
segment of the plurality of time segments averaged over the plurality of time segments. 

13. (new) The method as recited in claim 12 wherein the respective spectral representations
include the respective spectral short-time properties. 

14. (new) The method as recited in claim 12 wherein the respective correlation coefficient is
calculated from a subset of the respective spectral representations.--


IN THE ABSTRACT

Please delete lines 1-8. 

Line 9, change "rates, are erroneously evaluated. In" to --In a method for determining speech
quality using an objective measure, in--.

Line 11, delete "On".

Line 12, change "the other hand," to --Additionally--.


REMARKS

This Preliminary Amendment cancels original claims 1-6 and adds new claims 7-14. The new
claims do not add new matter to the application but do conform the claims to U.S. Patent and
Trademark Office rules.

The amendments to the specification, abstract and drawings are to conform the specification,
abstract and drawings to U.S. Patent and Trademark Office rules. It is respectfully submitted that the
amendments to the specification, abstract and drawings do not introduce new matter into the
application.

The underlying PCT application includes a Search Report, a copy of which is included
herewith.


Conclusion 


Consideration of the present application as amended is hereby respectfully requested. 


Respectfully Submitted, 


Kenyon & Kenyon 




Richard L. Mayer 
(Reg. No. 22,490) 


One Broadway 
New York, NY 10004 
Tel. (212)425-7200 
Fax. (212) 425-5288 
[2345/127]

METHOD FOR DETERMINING SPEECH QUALITY
USING OBJECTIVE MEASURES


Preliminary Remarks 

The invention relates to a method for determining speech quality using objective measures, 
in which characteristic values for determining speech quality are derived by comparing 
properties of a speech signal to be assessed to properties of a reference speech signal 
(undisturbed signal). 

Usually, the quality of speech signals is determined through auditory ("subjective") tests by 
test persons. 

The aim of objective methods for determining speech quality is to ascertain, with the aid of 
suitable calculation methods, characteristic values from the properties of the speech signal to 
be assessed, the characteristic values describing the speech quality of the speech signal to 
be assessed, without having to resort to the judgments of test persons. 

The calculated characteristic values and the underlying method for determining speech
quality using objective measures are regarded as acknowledged if a high correlation with the
results of auditory reference tests is achieved. Consequently, the speech-quality values
obtained by auditory tests represent the target values which are to be achieved by objective
methods.

Related Art 



Known methods for determining speech quality using objective measures are based on
a comparison of a reference speech signal to the speech signal to be assessed. In this
context, the reference speech signal and the speech signal to be assessed are segmented
into short time segments. The spectral properties of the two signals are compared in these
segments.
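The segmentation step can be illustrated with a minimal sketch. It assumes the signals are held as Python lists of samples; the function names `split_segments` and `power_spectrum`, the segment length and the hop size are hypothetical choices for illustration and are not taken from the text. A plain, unwindowed DFT stands in for whatever short-time spectral analysis an actual implementation would use.

```python
import cmath


def split_segments(signal, length, hop):
    """Cut a sample sequence into (possibly overlapping) segments of `length` samples."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, hop)]


def power_spectrum(segment):
    """Squared DFT magnitude for bins 0..N/2 (plain DFT, no window; an assumption)."""
    n = len(segment)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, s in enumerate(segment))
        spectrum.append(abs(x_k) ** 2)
    return spectrum
```

Both the reference speech signal and the speech signal to be assessed would be passed through the same two functions, so that their spectra can be compared segment by segment.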

Various approaches and models are used to calculate the spectral short-time properties.
Generally, the signal intensity is calculated in frequency bands whose width becomes greater
with increasing mid-frequency. Examples of such frequency bands are the known third-octave
bands or frequency groups according to Zwicker (published in Zwicker, E.:
"Psychoakustik" ["Psychoacoustics"], Berlin: Springer Publishing House, 1982).

The spectral intensity representation thus calculated for each time segment considered can 
be viewed as a series of numerical values, in which the number of individual values 
corresponds to the number of frequency bands used, the numerical values themselves 
represent the calculated intensity values, and a consecutive index of the frequency bands 
describes the sequence of the numerical values. 
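The band-wise intensity representation described above can be sketched as follows. The function name `band_intensities` and the example band limits are hypothetical; the limits are given as DFT-bin indices whose spacing merely grows with frequency, and they do not reproduce actual third-octave bands or Zwicker frequency groups.

```python
def band_intensities(spectrum, limits):
    """Sum the power-spectrum bins between consecutive band limits.

    `limits` holds bin indices; band i covers bins limits[i] .. limits[i+1]-1,
    so the result has len(limits) - 1 values, indexed by band number.
    """
    return [sum(spectrum[lo:hi]) for lo, hi in zip(limits[:-1], limits[1:])]


# Illustrative limits: band widths of 1, 1, 2 and 4 bins (widening with frequency).
example_limits = [0, 1, 2, 4, 8]
example = band_intensities([1, 2, 3, 4, 5, 6, 7, 8], example_limits)
```

The returned list is the "series of numerical values" for one time segment: one intensity per band, ordered by the band index.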

In the methods presently known for determining speech quality using objective measures, 
the limits of the frequency bands utilized are kept constant on the frequency axis. 

In each time segment under consideration, the calculated intensities of the speech signal to 
be assessed and of the reference speech signal are compared to each other in each band. 
The difference of both values, or the similarity of the two resulting spectral intensity 
representations, constitutes the basis for the calculation of a quality value (Fig. 1). 

Such methods were developed in particular for the qualitative assessment of speech in 
telephone applications. Examples thereof are the publications: 
"A perceptual speech-quality measure based on a psychoacoustic sound representation"
(Beerends, J. G.; Stemerdink, J. A., J. Audio Eng. Soc. 42(1994)3, pp. 115-123)

"Auditory distortion measure for speech coding" (Wang, S.; Sekey, A.; Gersho, A.: IEEE
Proc. Int. Conf. Acoust., Speech and Signal Processing (1991), pp. 493-496).

The presently valid ITU-T standard P.861 likewise describes such a method: "Objective
quality measurement of telephone-band speech codecs" (ITU-T Rec. P.861, Geneva
1996).


Disadvantages of Known Objective Speech-Quality Measurement Methods 

Known methods for determining speech quality using objective measures do not yield
reliable quality values for certain properties of the signal to be assessed. In particular,
presently known methods furnish only unreliable quality values when the speech signal
to be assessed is impaired, for example by speech coding methods with low bit rates or
by combinations of different disturbances.


In such cases, the presently known methods have the disadvantage that, given a comparison
between the speech signal to be assessed and a reference speech signal, the quality
characteristic value to be calculated includes differences between the two signal segments in
the selected representation plane which lead to no, or scarcely any, qualitative impairment
that is perceptible in the auditory test.

Within the framework of the transmission of speech in telephone applications that is being
discussed here, frequency-band limitations and spectral deformations of the speech signal to
be assessed (caused, for example, by filter properties of the telephone device or of the
transmission channel) contribute only to a limited extent to a perceived qualitative
impairment.


To partially prevent such deficiencies, an attempt is made in a different approach to
compensate for the linear distortions (frequency response) by a correction filter or a
power-transmission function (published in: "A new approach to objective quality-measures
based on attribute-matching", Halka, U.; Heute, U., Speech Communication, 11(1992)1,
pp. 15-30). However, the use of this method is disadvantageous in the case of nonlinear
and time-variant transmission, since the compensation function thus calculated no longer
exclusively describes the spectral deformations of the signal to be assessed.

In known methods, displacements of spectral short-time maxima ("formant displacements") 
in the signal under test in relation to the reference speech signal caused, for example, by 
coding systems with low bit rates, lead to large differences in the spectral intensity 
representations and therefore have a great influence on the calculated quality value. 
However, investigations have revealed that, in an auditory speech-quality test, these 
displacements of spectral short-time maxima have only a limited influence on the quality 
judgment. 

Object 

The object of the invention is to reduce the influence of spectral limitations and deformations 
of the speech signal to be assessed, as well as the influence of displacements of spectral 
short-time maxima, prior to comparing the spectral properties of a signal to be tested to a 
reference speech signal, and prior to the calculation of a quality value using objective 
methods. 

Achievement 


In contrast to known approaches, in the invention described here, a spectral weighting 
function is generated which is based on mean spectral envelopes, e.g., the mean spectral 
power density, of the speech signal to be assessed and the reference speech signal. This 
permits the use of the method in the case of nonlinear and time-variant transmission as well. 

The spectral weighting function is calculated from the quotients of the given values of the
mean spectral power density of the signal to be assessed, Φ_y(f), and that of the input signal
of the transmission system, Φ_x(f), such that the weighting function can be described via

$$W_T(f) = a(f)\,\frac{\Phi_y(f)}{\Phi_x(f)}.$$

The assessment function a(f) can weight the weighting function W_T(f) differently over the
range of effect, being constant at 1 in the simplest case.

The spectral weighting function W_T(f) thus calculated brings the mean spectral envelopes of
the speech signal to be assessed and the reference speech signal closer to each other, so 
that differences of the two spectral envelopes are included only to a reduced extent in the 
calculated quality value. 
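A minimal sketch of this calculation follows, assuming that each signal is already available as a list of short-time power spectra (one list of bin powers per time segment). The names `mean_spectrum` and `weighting_function`, and the small `floor` that guards against division by zero, are hypothetical additions for illustration and are not part of the described method.

```python
def mean_spectrum(spectra):
    """Average a list of short-time power spectra bin by bin."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def weighting_function(mean_y, mean_x, a=None, floor=1e-12):
    """W_T(f) = a(f) * Phi_y(f) / Phi_x(f); a(f) defaults to 1 in every bin.

    mean_y: mean power density of the signal to be assessed (Phi_y).
    mean_x: mean power density of the reference (input) signal (Phi_x).
    floor:  assumed lower bound on Phi_x to avoid dividing by zero.
    """
    if a is None:
        a = [1.0] * len(mean_x)
    return [a_f * p_y / max(p_x, floor)
            for a_f, p_y, p_x in zip(a, mean_y, mean_x)]
```

With a(f) = 1, multiplying Φ_x(f) by the returned W_T(f) reproduces Φ_y(f), which is the sense in which the two mean envelopes are brought closer together. Setting a(f) to 0 in some bins would restrict the correction to partial spectral regions.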

The spectral weighting function W_T(f) can be applied, firstly, to the reference speech signal.
In this context, the reference speech signal, in its mean spectral power density, is made to 
approximate the signal to be assessed (Fig. 2a). 

Secondly, the spectral weighting function can be applied, inverted, to the signal to be 
assessed. The distortion of the latter is thereby eliminated and, with regard to its mean 
spectral power density, it is made to approximate the reference speech signal (Fig. 2b). 
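The two ways of applying the weighting function can be sketched as follows, again assuming each signal is a list of short-time power spectra and the weighting function is a list with one value per bin. The function names and the `floor` guard are hypothetical.

```python
def weight_reference(ref_spectra, w):
    """Variant of Fig. 2a: multiply every short-time spectrum of the reference by W_T(f)."""
    return [[w_f * p for w_f, p in zip(w, spec)] for spec in ref_spectra]


def unweight_assessed(test_spectra, w, floor=1e-12):
    """Variant of Fig. 2b: apply the inverted weighting, 1 / W_T(f), to the assessed signal.

    `floor` is an assumed guard against division by zero where W_T(f) is 0.
    """
    return [[p / max(w_f, floor) for w_f, p in zip(w, spec)]
            for spec in test_spectra]
```

In both variants the same W_T(f) is used in all time segments, so only the mean spectral envelope is aligned while segment-to-segment differences between the signals remain.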

A further part of the invention relates to the correction of displacements of spectral
short-time maxima which are caused by the transmission systems.


The intensity is integrated for each time segment in frequency bands. The result is a series 
of intensity values for each spectral representation of a signal segment, each individual value 
representing the intensity in a frequency band. In this connection, the displacements of 
spectral short-time maxima may lead to different calculated intensities in the frequency 
bands of the reference speech signal and the speech signal to be assessed. 

These differences in the spectral intensity representations - caused by displacements of 
spectral short-time maxima - can be reduced by a variable arrangement of the frequency 
bands on the frequency axis. In contrast to the constant band limits in known methods, the 
band limits are displaced on the frequency axis. However, the number of frequency bands 
and their index remain constant. In an optimization loop, those band limits are then 
accepted at which the two resulting spectral representations of speech signal to be assessed 
and reference speech signal exhibit maximum similarity, or whose difference is minimal. This 
optimization is carried out for all bands in all time segments under consideration. 
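One possible reading of this optimization loop is sketched below for a single time segment. It rests on several assumptions that the text does not fix: band limits are expressed as DFT-bin indices, only the limits of the signal to be assessed are displaced, each band is shifted as a whole by an integer number of bins within a hypothetical search range `max_shift`, and the criterion is the smallest absolute intensity difference in that band. Ties are resolved in favor of the smallest displacement.

```python
def band_sum(spectrum, lo, hi):
    """Integrate intensity over bins lo..hi-1, clipped to the valid bin range."""
    n = len(spectrum)
    lo = min(max(lo, 0), n)
    hi = min(max(hi, lo), n)
    return sum(spectrum[lo:hi])


def best_shifted_bands(ref_spec, test_spec, limits, max_shift):
    """Per band, pick the displacement of the test-signal band limits that
    minimizes the intensity difference to the reference band.

    Returns one (shift, reference_intensity, test_intensity) tuple per band.
    The number of bands and their order (index) stay unchanged.
    """
    result = []
    for lo, hi in zip(limits[:-1], limits[1:]):
        ref_i = band_sum(ref_spec, lo, hi)
        shifts = range(-max_shift, max_shift + 1)
        best = min(shifts,
                   key=lambda s: (abs(band_sum(test_spec, lo + s, hi + s) - ref_i),
                                  abs(s)))
        result.append((best, ref_i, band_sum(test_spec, lo + best, hi + best)))
    return result
```

For example, a spectral maximum that sits one bin higher in the assessed signal than in the reference is matched by shifting the affected bands by one bin, so it no longer produces a large band-intensity difference. A full implementation would repeat this for every time segment under consideration.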

The use of variable band limits to calculate the spectral intensity representation is not
restricted only to the signal in which the described spectral weighting function W_T(f) is also
used, but may also be applied to the other respective signal and even to both signals (see
Figs. 2a and 2b).

Exemplary Embodiment: 

A special exemplary embodiment is shown by an implementation according to Fig. 3, which 
is known as TOSQA (Telecommunication Objective Speech Quality Assessment). In this 
case, an expanded preprocessing of the reference speech signal is carried out. 

In specification of the general implementations according to Fig. 2a and 2b, speech pauses 
are detected here by a speech-pause detector and are not included in the quality measure. 
Likewise, the reference speech signal and the speech signal to be assessed are filtered with 
a 300 ... 3400 Hz bandpass filter, and there is also filtering to the frequency response of a
telephone handset. The integration of the spectral power density is carried out in frequency 
groups which represent the basis for the calculation of the specific loudness. 

However, the integration in frequency groups is not carried out in fixed frequency-group
limits, but with the variable frequency-group limits described in the present invention. The
calculated signal powers in the frequency groups thus modified form the basis for the
intensity calculation. Use was made here of a model for calculating the specific loudness
according to Zwicker, an aurally compensated intensity representation (published in
Zwicker, E.: "Psychoakustik" ["Psychoacoustics"], Berlin: Springer Publishing House,
1982).

As an addition to the general approach, the calculated loudness patterns are supplemented
by an error assessment function. The calculated quality value is formed via a mean value of
the correlation coefficients of the specific loudness for each short time segment under
consideration over the number of evaluated speech segments.
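The final averaging step can be sketched as follows, assuming one loudness pattern (a list of specific-loudness values per frequency group) for each evaluated segment of each signal. The Zwicker loudness model and the error assessment function are not reproduced; the Pearson correlation coefficient is used as the correlation measure, and returning 0 for a constant pattern is an assumption made only to keep the sketch well defined.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equally long value series."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumed convention for constant patterns
    return cov / math.sqrt(var_a * var_b)


def quality_value(ref_patterns, test_patterns):
    """Mean of the per-segment correlation coefficients over all evaluated segments."""
    coeffs = [correlation(r, t) for r, t in zip(ref_patterns, test_patterns)]
    return sum(coeffs) / len(coeffs)
```

Identical patterns in every segment give a value of 1; the value falls as the loudness patterns of the assessed signal diverge from those of the reference.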
Patent Claims

1. A method for determining speech quality using objective measures, in which
characteristic values for determining speech quality are calculated by comparing spectral
short-time properties of a speech signal to be assessed to a reference speech signal,
characterized in that, prior to comparing the properties of the speech signals, differences in 
mean spectral envelopes are reduced by first calculating from them a spectral weighting 
function with which the spectral short-time properties of the speech signals in all time 
segments under consideration are weighted, so that the differences in the mean spectral 
envelopes are thereby included only to a limited extent in the quality characteristic value to 
be calculated; and that the limits of the frequency bands used are made variable for 
calculating the signal intensity, so that, for each signal segment under consideration in all 
evaluated frequency bands, the calculated intensities of the reference speech signal and the 
signal to be assessed differ as little as possible from each other. 

2. The method as recited in Claim 1,
characterized in that, first of all, the mean spectral envelopes of the speech signal to be
assessed and the reference speech signal are calculated in the form of a mean power density
spectrum, and a spectral weighting function W_T(f) is calculated from the quotients of both
spectra, the short-time power density spectra of the reference speech signal being weighted
with said spectral weighting function W_T(f) prior to calculating a quality characteristic value.

3. The method as recited in Claims 1 and 2,
characterized in that the weighting function W_T(f) to be calculated is calculated only from
partial regions of the calculated mean spectral envelopes of the speech signal to be assessed
and the reference speech signal and, consequently, the differences in mean spectral
envelopes between both signals are reduced only in partial spectral regions.

4. The method as recited in Claims 1 through 3, characterized in that, prior to
calculating the quality characteristic values, there is an integration of the signal intensity for
each evaluated short time segment in frequency groups, the limits of the frequency groups 
being variable on the frequency axis, but the width of the frequency groups remaining 
constant on the pitch scale, and that the specific loudness is calculated from the signal 
intensities in the frequency groups, the limits of those frequency groups being used in which 
the calculated differences in the specific loudness between the signal to be assessed and the 
reference speech signal exhibit the smallest difference in the band and time segment under 
consideration. 

5. The method as recited in Claims 1 through 4, characterized in that the quality 
characteristic value is calculated from the similarity of the spectral representations in each 
time segment under consideration, the similarity representing a correlation coefficient, 
averaged over all time segments under consideration, between the spectral representation of 
the speech signal to be assessed and the spectral representation of the reference speech 
signal in the respective time segment. 

6. The method as recited in Claim 5, 

characterized in that the correlation coefficient between the spectral representation of the 
speech signal to be assessed and the spectral representation of the reference speech signal 
in the respective time segment is calculated from only a partial region of the spectral 
representation, i.e. not all calculated spectral values are taken into consideration for the 
calculation of the quality characteristic value. 
Abstract


Known methods for instrumental voice quality evaluation based on comparing signal 
intensities of the voice signal to be evaluated with a reference voice signal do not optimally 
evaluate spectral distortions in the voice signal to be evaluated so that quality evaluation is 
unreliable. Moreover, by integrating the signal intensity in the frequency bands with constant 
band limits, certain falsifications of the voice signal to be evaluated, such as those caused, 
for instance, by coding systems with lower bit rates, are erroneously evaluated. In order to 
enhance prediction reliability of the evaluated quality parameters, distortions of the mean 
spectral envelope are extensively corrected with a weighting function W_T(f) before
comparing spectral properties. On the other hand, the fixed band limits for integration of 
spectral power density are suppressed and other band limits are searched for instead in a 
predetermined optimization area in which the resulting spectral intensity representations of 
the voice signal to be evaluated and the reference voice signal have maximum similarity. 
The solutions described can supplement known methods and can be incorporated into their 
structures. 
U.S. DEPARTMENT OF COMMERCE
PATENT AND TRADEMARK OFFICE

DECLARATION AND POWER OF ATTORNEY

ATTORNEY'S DOCKET NO. 2345/127


As a below named inventor, I hereby declare that: 


My residence, post office address, and citizenship are as stated below next to my name, 

I believe I am an original, first, and joint inventor of the subject matter that is claimed and for
which a patent is sought on the invention entitled METHOD FOR INSTRUMENTAL VOICE
QUALITY EVALUATION, the specification of which was filed as International Application No.
PCT/EP99/05972 on 14 August 1999.

I hereby state that I have reviewed and understand the contents of the above identified 
specification, including the claims. 

I acknowledge the duty to disclose information which is material to the examination of this 
application in accordance with Title 37, Code of Federal Regulations, § 1.56(a).

PRIOR FOREIGN APPLICATION(S)

I hereby claim foreign priority benefits under Title 35, United States Code, § 119 of any foreign
application(s) for patent or inventor's certificate listed below and have also identified below any foreign
application for patent or inventor's certificate having a filing date before that of the application on which
priority is claimed:


COUNTRY    APPLICATION     DATE OF FILING        DATE OF ISSUE         PRIORITY CLAIMED
           NUMBER          (day, month, year)    (day, month, year)    UNDER 35 U.S.C. § 119

Germany    198 40 548.0    27 August 1998                              YES

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorneys: 

Richard L. Mayer (Reg. No. 22,490) 



Erik R. Swanson (Reg. No.jUL,833)_ 



SEND CORRESPONDENCE, AND DIRECT TELEPHONE CALLS TO:

Richard L. Mayer
KENYON & KENYON
One Broadway
New York, New York 10004
(212) 425-7200 (phone)
(212) 425-5288 (facsimile)
I declare that all statements made herein of my own knowledge are true and all statements
made on information and belief are believed to be true; and further that these statements were made
with the knowledge that willful false statements and the like so made are punishable by fine or
imprisonment, or both, under § 1001 of Title 18 of the United States Code and that such willful
statements may jeopardize the validity of the application or any patent issuing thereon.


FULL NAME OF INVENTOR
    FAMILY NAME:               BERGER
    FIRST GIVEN NAME:          Jens
    SECOND GIVEN NAME:

RESIDENCE & CITIZENSHIP
    CITY & ZIP CODE:           D-10405 Berlin
    STATE OR FOREIGN COUNTRY:  Germany
    COUNTRY OF CITIZENSHIP:    Germany

POST OFFICE ADDRESS
    POST OFFICE ADDRESS:       Raabestrasse 8
    CITY & ZIP CODE:           D-10405 Berlin
    STATE OR FOREIGN COUNTRY:  Germany